(57) Abstract

An apparatus for encoding a digital video signal (217) to reduce a transmission rate of the digital video signal comprises a feature point based motion compensation circuit (150) for selecting a set of feature points from the reconstructed reference frame to detect a set of motion vectors between a current frame and an original reference frame corresponding to the set of feature points by using a feature point based motion estimation, and for generating a second predicted frame based on the set of motion vectors and the reconstructed reference frame. The feature point based motion estimation employs a convergence process in which a displacement is given to the motion vector of each of the feature points, and the six triangles of each hexagon are affine-transformed independently using the displacements of their vertex feature points. If the displacement provides a better peak signal to noise ratio (PSNR), the motion vector of the subject feature point is updated; the feature points are processed sequentially. The inventive convergence process is therefore very efficient at making the predicted image as close as possible to the original image, even when objects undergo zooming, rotation or scaling.
METHOD AND APPARATUS FOR ENCODING A VIDEO SIGNAL
USING FEATURE POINT BASED MOTION ESTIMATION 

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a method and apparatus for encoding a video signal; and, more particularly, to a method and apparatus for encoding a digital video signal using an improved feature point based motion estimation, thereby effectively reducing the transmission rate of the digital video signal with a good picture quality.

BACKGROUND ART

As is well known, transmission of digitized video signals can attain video images of a much higher quality than the transmission of analog signals. When an image signal comprising a sequence of image "frames" is expressed in a digital form, a substantial amount of data is generated for transmission, especially in the case of a high definition television system. Since, however, the available frequency bandwidth of a conventional transmission channel is limited, it is necessary to compress or reduce the volume of the transmission data in order to transmit such substantial amounts of digital data therethrough. Among various video compression techniques, the so-called hybrid coding technique, which combines temporal and spatial compression techniques together with a statistical coding technique, is known to be the most effective.

Most hybrid coding techniques employ a motion compensated DPCM (differential pulse coded modulation), two-dimensional DCT (discrete cosine transform), quantization of DCT coefficients, and VLC (variable length coding). The motion compensated DPCM is a process of estimating the movement of an object between a current frame and a previous or future frame, i.e., a reference frame, and predicting the current frame according to the motion flow of the object to produce a differential signal representing the difference between the current frame and its prediction. This method is described, for example, in Staffan Ericsson, "Fixed and Adaptive Predictors for Hybrid Predictive/Transform Coding", IEEE Transactions on Communications, COM-33, No. 12 (December 1985); and in Ninomiya and Ohtsuka, "A Motion-Compensated Interframe Coding Scheme for Television Pictures", IEEE Transactions on Communications, COM-30, No. 1 (January 1982).

The two-dimensional DCT, which exploits spatial redundancies between image data to reduce them, converts a block of digital image data, for example, a block of 8x8 pixels, into a set of transform coefficient data. This technique is described in Chen and Pratt, "Scene Adaptive Coder", IEEE Transactions on Communications, COM-32, No. 3 (March 1984). By processing such transform coefficient data with a quantizer, zigzag scanning, and VLC, the amount of data to be transmitted can be effectively compressed.
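As an illustration of the transform stage just described, the following Python sketch computes an orthonormal 8x8 DCT-II, quantizes the coefficients with one uniform step size and reads them out in the conventional zigzag order. The single step size and the function names (`dct2`, `zigzag_order`, `quantize_and_scan`) are assumptions made for illustration; the coders cited above use their own quantization rules and VLC tables, which are not reproduced here.

```python
import math

N = 8  # block size from the example in the text (8x8 pixels)


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def zigzag_order(n=N):
    """(row, col) pairs in zigzag order: anti-diagonals, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def quantize_and_scan(block, step=16):
    """DCT, uniform quantization with a single (assumed) step, then zigzag scan."""
    coeffs = dct2(block)
    return [round(coeffs[r][c] / step) for r, c in zigzag_order()]


flat = [[100] * N for _ in range(N)]
print(quantize_and_scan(flat)[:4])  # [50, 0, 0, 0]: only the DC term survives
```

For a flat block every AC coefficient is zero, so after zigzag scanning the block collapses to one DC level followed by a run of zeros, which is the kind of sequence that run-length and variable length coding compress well.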

Specifically, in the motion compensated DPCM, current frame data is predicted from the corresponding reference frame data based on an estimation of the motion between the current frame and a reference frame. Such estimated motion may be described in terms of two-dimensional motion vectors representing the displacement of pixels between the reference and the current frames.
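The prediction-and-difference step of the motion compensated DPCM can be sketched in Python as below. The sketch assumes that frames are lists of rows of pixel values, that each block of `bs` x `bs` pixels carries one motion vector (dx, dy) pointing from the current frame into the reference frame, and that references falling outside the frame are clamped to the border; these choices and the names `mc_predict` and `dpcm_residual` are illustrative only.

```python
def mc_predict(ref, mvs, bs):
    """Build a predicted frame by copying, for every block, the reference
    pixels displaced by that block's motion vector (border-clamped)."""
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            dx, dy = mvs[(bx, by)]
            for y in range(by, min(by + bs, h)):
                for x in range(bx, min(bx + bs, w)):
                    sy = min(max(y + dy, 0), h - 1)
                    sx = min(max(x + dx, 0), w - 1)
                    pred[y][x] = ref[sy][sx]
    return pred


def dpcm_residual(cur, pred):
    """Differential signal: current frame minus its motion compensated prediction."""
    return [[c - p for c, p in zip(row_c, row_p)] for row_c, row_p in zip(cur, pred)]


ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][min(x + 1, 7)] for x in range(8)] for y in range(8)]  # content moved left by 1
mvs = {(bx, by): (1, 0) for by in range(0, 8, 4) for bx in range(0, 8, 4)}
print(all(v == 0 for row in dpcm_residual(cur, mc_predict(ref, mvs, 4)) for v in row))  # True
```

When the motion vectors explain the change between frames, the differential signal is all zeros and only the vectors need to be sent; any remaining difference is what the DCT stage then codes.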

There have been two basic approaches to estimating the displacement of pixels of an object: one is a block-by-block estimation and the other is a pixel-by-pixel approach.

In the block-by-block motion estimation, a block in a current frame is compared with blocks in its reference frame until a best match is determined. From this, an interframe displacement vector (which indicates how much the block of pixels has moved between frames) for the whole block can be estimated for the current frame being transmitted.
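A minimal full-search block matching sketch in Python follows. The text does not name a matching criterion, so the sum of absolute differences (SAD) is assumed here, as are the block size, the search range and the convention that a block at (bx, by) in the current frame matches the block at (bx + dx, by + dy) in the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(bs) for i in range(bs))


def block_match(cur, ref, bx, by, bs=8, r=7):
    """Exhaustive search over displacements in [-r, r]^2; returns ((dx, dy), sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if not (0 <= bx + dx and bx + dx + bs <= w and 0 <= by + dy and by + dy + bs <= h):
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
print(block_match(cur, ref, 4, 4, bs=4, r=3))  # ((2, 1), 0)
```

One vector is produced for the whole block, which is why the approach breaks down when the pixels inside a block do not all move the same way.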

Such a block matching technique may be employed in predicting P and B frames included in video sequences, as disclosed in ITU Telecommunication Standardization Sector Study Group 15, Working Party 15/1 Expert's Group on Very Low Bit Rate Visual Telephony, "Video Codec Test Model, TMN4 Rev1", (October 25, 1994), wherein a P or predictive frame denotes a frame which is predicted from its previous frame (as the reference frame), while a B or bidirectionally predictive frame is predicted from its previous and future frames (as the reference frames). In coding the so-called B frame, in particular, a bidirectional motion estimation technique is employed in order to derive forward and backward displacement vectors, wherein the forward displacement vector is obtained by estimating the movement of an object between a B frame and its previous intra (I) or predictive (P) frame (as the reference frame), and the backward displacement vector is derived based on the B frame and its future I or P frame (as the reference frame).

However, in the block-by-block motion estimation, blocking effects may occur at the boundaries of a block in the motion compensation process, and poor estimates may result if all the pixels in the block do not move in the same way, thereby decreasing the overall picture quality.

Using a pixel-by-pixel approach, on the other hand, a displacement is determined for each and every pixel. This technique allows a more exact estimation of the pixel value and has the ability to easily handle scale changes (e.g., zooming, movement perpendicular to the image plane). However, in the pixel-by-pixel approach, since a motion vector is determined at each and every pixel, it is virtually impossible to transmit all of the motion vectors to a receiver.
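To put "virtually impossible" into numbers, the short Python calculation below counts the motion vectors per frame for both approaches. The frame size (CIF, 352 x 288 pixels) and the 16 x 16 block size are assumptions chosen for illustration; neither figure comes from the text.

```python
def motion_vector_counts(width, height, block=16):
    """Motion vectors per frame for pixel-by-pixel vs block-by-block estimation."""
    per_pixel = width * height
    per_block = (width // block) * (height // block)
    return per_pixel, per_block


print(motion_vector_counts(352, 288))  # (101376, 396)
```

Under these assumptions the pixel-by-pixel approach needs 256 times as many vectors as the block-based one, before any of them is even coded.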

One of the techniques introduced to ameliorate the problem of dealing with the surplus or superfluous transmission data resulting from the pixel-by-pixel approach is a feature point based motion estimation method.

In the feature point based motion estimation technique, motion vectors for a set of selected pixels, i.e., feature points, are transmitted to a receiver, wherein each of the feature points is defined as a pixel capable of representing its neighboring pixels so that motion vectors for non-feature points can be recovered or approximated from those of the feature points in the receiver. In an encoder which adopts the motion estimation based on feature points, disclosed in a copending commonly owned application, U.S. Ser. No. 08/367,520, entitled "Method and Apparatus for Encoding a Video Signal Using Pixel-by-Pixel Motion Estimation", a number of feature points are first selected from all of the pixels contained in the previous frame. Then, motion vectors for the selected feature points are determined, wherein each of the motion vectors represents a spatial displacement between one feature point in the previous frame and a corresponding matching point, i.e., a most similar pixel, in the current frame. Specifically, the matching point for each of the feature points is searched in a search region within the current frame by using a known block matching algorithm, wherein a feature point block is defined as a block surrounding the selected feature point, and the search region is defined as a region within a predetermined area which encompasses the position of the corresponding feature point.

In this case, it would be most desirable or convenient to find only one best matching feature point block over the entire search region corresponding to the selected feature point. Sometimes, however, a plurality of equivalent best matching feature point blocks may be found during the feature point matching. As a result, it is difficult to correctly detect a motion vector for the feature point from such a correlation between the feature point block and the corresponding search region. Furthermore, poor estimates may result if the search region is not determined in accordance with the spatial displacement between the feature point in the reference frame and a corresponding matching point, i.e., a most similar pixel, in the current frame, thereby deteriorating the overall picture quality.
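The ambiguity described above can be reproduced with a small Python sketch that returns every displacement achieving the minimum matching cost instead of just one. The SAD criterion, the 3x3 feature point block (`half=1`), the search range `r` and the function name are assumptions for illustration only.

```python
def matching_candidates(ref, cur, fx, fy, half=1, r=2):
    """All displacements (dx, dy) that minimise the SAD between the feature
    point block around (fx, fy) in ref and a block in the search region of cur."""
    h, w = len(cur), len(cur[0])
    costs = {}
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            cx, cy = fx + dx, fy + dy
            if not (half <= cx < w - half and half <= cy < h - half):
                continue  # block would fall outside the current frame
            costs[(dx, dy)] = sum(
                abs(ref[fy + j][fx + i] - cur[cy + j][cx + i])
                for j in range(-half, half + 1) for i in range(-half, half + 1))
    best = min(costs.values())
    return sorted(d for d, c in costs.items() if c == best)


flat = [[128] * 10 for _ in range(10)]
print(len(matching_candidates(flat, flat, 5, 5)))  # 25: every candidate ties

textured = [[10 * y + x for x in range(10)] for y in range(10)]
print(matching_candidates(textured, textured, 5, 5))  # [(0, 0)]
```

In a uniform region every position in the search window matches equally well, so the block matcher has no basis for choosing one motion vector; only textured neighbourhoods give a unique answer.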

DISCLOSURE OF THE INVENTION

It is, therefore, an object of the invention to provide a method for effectively estimating motion vectors for the feature points, thereby effectively reducing the transmission rate of the digital video signal with a good picture quality.

Another object of the invention is to provide an apparatus, for use in a video signal encoding system, for effectively estimating motion vectors employing a feature point based motion estimation, thereby effectively reducing the transmission rate of the digital video signal with a good picture quality.

Another object of the invention is to provide a video signal encoding system selectively employing a feature point based motion estimation and a block based motion estimation, to thereby effectively improve the overall picture quality.

WO 96/29828 PCT/KR95/00050

In accordance with one aspect of the present invention, there is provided a method for detecting a set of motion vectors between a current frame and a reference frame of video signals by employing a feature point based motion estimation approach, wherein the reference frame includes a reconstructed reference frame and an original reference frame, which comprises the steps of:

(a) selecting a set of feature points from pixels contained in the reconstructed reference frame, wherein the set of feature points forms a polygonal grid having a plurality of overlapping polygons;

(b) determining a set of quasi-feature points on the current frame based on the set of feature points;

(c) assigning a set of initial motion vectors for the set of quasi-feature points, wherein each of the initial motion vectors is set to (0,0);

(d) appointing one of the quasi-feature points as a subject quasi-feature point, wherein the subject quasi-feature point has N number of neighboring quasi-feature points which form a subject current polygon defined by line segments connecting the subject quasi-feature point and said N number of neighboring quasi-feature points, N being a positive integer;

(e) sequentially adding the initial motion vector of the subject quasi-feature point to M number of candidate motion vectors to produce M number of updated initial motion vectors, M being a positive integer, wherein said M number of candidate motion vectors cover a predetermined region in the subject current polygon and the initial motion vectors of said neighboring quasi-feature points are fixed;

(f) determining a predicted position on the original reference frame for each pixel contained in the subject current polygon based on each of the M number of updated initial motion vectors for the subject quasi-feature point and said N number of initial motion vectors of the neighboring quasi-feature points;

(g) providing a predicted pixel value for said each pixel from the original reference frame based on the predicted position to form M number of predicted subject current polygons;

(h) calculating the difference between the subject current polygon and each of the predicted subject current polygons to produce M number of peak signal to noise ratios (PSNRs);

(i) selecting, as a selected updated motion vector, the one of the updated motion vectors which entails the predicted subject current polygon having the maximum PSNR, and updating the initial motion vector of the subject quasi-feature point with the selected updated motion vector;

(j) repeating the steps (d) to (i) until all of the initial motion vectors are updated;

(k) repeating the step (j) a predetermined number of times; and

(l) establishing the set of initial motion vectors as the set of motion vectors.
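The convergence steps above can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: the feature points are taken as given (step (a) is not modelled), the quasi-feature points are assumed to start at the feature point positions, the M candidate vectors are assumed to be a square window of ±`search` pixels, each subject polygon is assumed to be the fan of triangles formed by the subject point and consecutive pairs of its ordered neighbours, and predicted pixel values are taken by nearest-neighbour sampling. As in the abstract, a vector is replaced only when a candidate gives a strictly better PSNR. All function names are hypothetical.

```python
import math

# Assumptions (not from the text): a frame is a list of rows of 8-bit values;
# a quasi-feature point at (x, y) with motion vector (dx, dy) corresponds to
# (x + dx, y + dy) on the original reference frame.


def affine_from_triangle(src, dst):
    """Coefficients (a, b, c, d, e, f) of x' = a*x + b*y + c, y' = d*x + e*y + f
    mapping the three src vertices exactly onto the three dst vertices."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate source triangle")

    def solve(u0, u1, u2):
        # Cramer's rule for u = p*x + q*y + r passing through the three vertices
        p = ((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0)) / det
        q = ((x1 - x0) * (u2 - u0) - (x2 - x0) * (u1 - u0)) / det
        return p, q, u0 - p * x0 - q * y0

    a, b, c = solve(*(v[0] for v in dst))
    d, e, f = solve(*(v[1] for v in dst))
    return a, b, c, d, e, f


def pixels_in_triangle(tri):
    """Integer pixel positions inside or on the edges of a triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri

    def edge(ax, ay, bx, by, px, py):
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    for y in range(math.floor(min(y0, y1, y2)), math.ceil(max(y0, y1, y2)) + 1):
        for x in range(math.floor(min(x0, x1, x2)), math.ceil(max(x0, x1, x2)) + 1):
            w = (edge(x1, y1, x2, y2, x, y), edge(x2, y2, x0, y0, x, y),
                 edge(x0, y0, x1, y1, x, y))
            if min(w) >= 0 or max(w) <= 0:
                yield x, y


def polygon_psnr(cur, ref, points, mvs, s, trial_mv, nbrs):
    """PSNR of the subject polygon (triangles s, n_k, n_k+1) predicted from ref
    when point s takes trial_mv and its neighbours keep their vectors in mvs."""
    h, w = len(cur), len(cur[0])

    def moved(i):
        dx, dy = trial_mv if i == s else mvs[i]
        return points[i][0] + dx, points[i][1] + dy

    pred = {}
    for k in range(len(nbrs)):
        tri = (s, nbrs[k], nbrs[(k + 1) % len(nbrs)])
        src = [points[i] for i in tri]
        a, b, c, d, e, f = affine_from_triangle(src, [moved(i) for i in tri])
        for x, y in pixels_in_triangle(src):
            if 0 <= x < w and 0 <= y < h:
                # nearest-neighbour sampling, clamped to the frame (a simplification)
                rx = min(max(round(a * x + b * y + c), 0), w - 1)
                ry = min(max(round(d * x + e * y + f), 0), h - 1)
                pred[(x, y)] = ref[ry][rx]
    mse = sum((cur[y][x] - v) ** 2 for (x, y), v in pred.items()) / len(pred)
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)


def converge(cur, ref, points, neighbors, search=1, iterations=2):
    """Refine one motion vector per quasi-feature point by candidate search."""
    mvs = [(0, 0)] * len(points)  # step (c): all initial vectors are (0, 0)
    cands = [(dx, dy) for dy in range(-search, search + 1)
             for dx in range(-search, search + 1) if (dx, dy) != (0, 0)]
    for _ in range(iterations):  # step (k): a predetermined number of passes
        for s, nbrs in enumerate(neighbors):  # steps (d) to (j)
            if not nbrs:
                continue  # points without a polygon are simply kept fixed here
            best = mvs[s]
            best_psnr = polygon_psnr(cur, ref, points, mvs, s, best, nbrs)
            for dx, dy in cands:  # steps (e) to (h)
                trial = (mvs[s][0] + dx, mvs[s][1] + dy)
                score = polygon_psnr(cur, ref, points, mvs, s, trial, nbrs)
                if score > best_psnr:  # update only on a strictly better PSNR
                    best, best_psnr = trial, score
            mvs[s] = best  # step (i)
    return mvs  # final step: the refined vectors become the motion vectors


ref = [[(7 * x + 13 * y) % 256 for x in range(16)] for y in range(16)]
cur = [[ref[y][min(x + 1, 15)] for x in range(16)] for y in range(16)]
points = [(8, 8), (12, 8), (10, 12), (6, 12), (4, 8), (6, 4), (10, 4)]  # one hexagon
neighbors = [[1, 2, 3, 4, 5, 6]] + [[] for _ in range(6)]  # only the centre moves
print(converge(cur, ref, points, neighbors))
```

Because only the subject point's vector changes while its neighbours stay fixed, each trial affects only the six triangles of one hexagon, and every accepted update can only raise that hexagon's PSNR.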

In accordance with another aspect of the present invention, there is provided an apparatus, for use in a video encoding system, for detecting a set of motion vectors between a current frame and a reference frame of video signals by employing a feature point based motion estimation, wherein the reference frame includes a reconstructed reference frame and an original reference frame, which comprises:

first selection means for selecting a set of pixels from the reconstructed reference frame as a set of feature points, wherein the set of feature points forms a polygonal grid having a plurality of overlapping polygons;

means for determining a set of quasi-feature points on the current frame corresponding to the set of feature points;

memory means for storing a set of initial motion vectors for the set of quasi-feature points, wherein each of the initial motion vectors is set to (0,0);
second selection means for selecting L number of subject quasi-feature points from the set of quasi-feature points, wherein each of the subject quasi-feature points has N number of neighboring quasi-feature points which form a non-overlapping subject current polygon defined by line segments connecting the subject quasi-feature point and said N number of neighboring quasi-feature points, said L and N being positive integers;

adder means for adding the initial motion vector of each of the subject quasi-feature points to M number of candidate motion vectors to generate M number of updated initial motion vectors for each of the subject quasi-feature points, M being a positive integer, wherein said M number of candidate motion vectors cover a predetermined region in each of the non-overlapping subject current polygons and the initial motion vectors of the neighboring quasi-feature points for each of the subject quasi-feature points are fixed;

means for determining a predicted position on the original reference frame for each pixel contained in each of the non-overlapping subject current polygons based on each of the updated initial motion vectors and the initial motion vectors of the corresponding neighboring quasi-feature points;

means for obtaining a predicted pixel value from the original reference frame based on the predicted position to thereby form M number of predicted subject current polygons for each of the non-overlapping subject current polygons;

means for calculating the differences between each of the non-overlapping subject current polygons and the corresponding M number of predicted subject current polygons to produce M number of peak signal to noise ratios (PSNRs) for each of the non-overlapping subject current polygons;

third selection means for selecting one of the updated initial motion vectors, for each of the subject quasi-feature points, as a selected updated initial motion vector which entails the predicted subject current polygon having the maximum PSNR, to produce L number of selected updated initial motion vectors;

means for updating the initial motion vector for each of the subject quasi-feature points stored in the memory means with the corresponding selected updated initial motion vector; and

means for retrieving the set of initial motion vectors from the memory means as the set of motion vectors when all of the initial motion vectors have been updated a predetermined number of times.
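The apparatus variant updates L subject quasi-feature points at once, each with its own non-overlapping polygon. The text does not say how those L points are chosen; one possibility, sketched below in Python as an assumption, is a greedy pass that never picks two adjacent points and repeats on the leftover points until every point has had a turn. If each subject polygon consists only of triangles having the subject point as a vertex, two non-adjacent points share no triangle, so their polygons cannot overlap, and because all their neighbours are held fixed, the points in one round can be updated independently. The neighbour lists are assumed to be symmetric, and the function name is hypothetical.

```python
def partition_into_rounds(neighbors):
    """Split updatable points into rounds of mutually non-adjacent points.

    neighbors[p] lists the quasi-feature points adjacent to p (assumed symmetric);
    points with an empty list are treated as fixed and never selected."""
    remaining = [p for p, nbrs in enumerate(neighbors) if nbrs]
    rounds = []
    while remaining:
        chosen, blocked = [], set()
        for p in remaining:
            if p in blocked:
                continue
            chosen.append(p)              # p is a subject point in this round
            blocked.update(neighbors[p])  # its neighbours stay fixed this round
        rounds.append(chosen)
        chosen_set = set(chosen)
        remaining = [p for p in remaining if p not in chosen_set]
    return rounds


path = [[1], [0, 2], [1, 3], [2]]
print(partition_into_rounds(path))  # [[0, 2], [1, 3]]
```

Each round always selects at least its first remaining point, so the loop terminates, and the number of rounds stays small on a regular hexagonal grid because most points are far apart from one another.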

In accordance with another aspect of the present invention, there is provided an apparatus for encoding a digital video signal to reduce the transmission rate of the digital video signal, said digital video signal having a plurality of frames including a current frame and a reference frame, which comprises:

first memory means for storing a reconstructed reference frame of the digital video signal;

second memory means for storing an original reference frame of the digital video signal;

first motion compensation means for detecting a number of motion vectors between the current frame and the reconstructed reference frame by using a block based motion estimation and for generating a first predicted current frame based on the number of motion vectors and the reconstructed reference frame;

second motion compensation means for selecting a set of feature points from the reconstructed reference frame to detect a set of motion vectors between the current frame and the original reference frame corresponding to the set of feature points by using a feature point based motion estimation, and for generating a second predicted current frame based on the set of motion vectors and the reconstructed reference frame;

means for selectively providing either the number of motion vectors and the first predicted current frame or the set of motion vectors and the second predicted current frame as selected motion vectors and a predicted current frame;

means for transform-coding an error signal representing the difference between the predicted current frame and the current frame to produce a transform coded error signal; and

means for statistically coding the transform coded error signal and the selected motion vectors to produce an encoded video signal to be transmitted.

BRIEF DESCRIPTION OF THE DRAWINGS 

The above and other objects and features of the present invention will become apparent from the following description of preferred embodiments given in conjunction with the accompanying drawings, in which:

Fig. 1 shows an image signal encoding apparatus having a feature point based motion compensation device in accordance with the present invention;

Figs. 2A and 2B depict schematic diagrams illustrating two frame sequences;

Fig. 3 shows a detailed block diagram of the motion compensation device shown in Fig. 1;

Fig. 4 exhibits an exemplary block diagram of the motion vector search block illustrated in Fig. 3;

Figs. 5A and 5B offer exemplary diagrams of the current frame and the reconstructed previous frame;

Figs. 6A to 6E describe exemplary diagrams showing the feature point selecting operation in accordance with the present invention; and

Figs. 7A and 7B illustrate explanatory diagrams describing the motion vector searching process in accordance with the present invention.
MODES OF CARRYING OUT THE INVENTION

Referring to Fig. 1, there is shown a block diagram of an image encoding system in accordance with the present invention. The image encoding system comprises a frame reordering circuit 101, a subtractor 102, an image signal encoder 105, an image signal decoder 113, an adder 115, a first memory device 120, a second memory device 130, an entropy coder 107 and a motion compensation device 150.

An input digital video signal includes two frame (or picture) sequences, as shown in Figs. 2A and 2B: a first frame sequence is provided with one intra (I) frame, I1, three bidirectionally predictive frames, B1, B2, B3, and three predictive frames, P1, P2, P3; and a second frame sequence has one intra (I) frame, I1, three forwardly predictive frames, F1, F2, F3, and three predictive frames, P1, P2, P3. Accordingly, the image coding system has two sequence coding modes: a first sequence coding mode and a second sequence coding mode. In the first sequence coding mode, a line L17 is coupled to the line L11 by a first switch 103, and the first frame sequence, which includes I1, B1, P1, B2, P2, B3, P3, is applied via the first switch 103 to the frame reordering circuit 101, which is adapted to reorder it into a reordered digital video signal of, e.g., I1, P1, B1, P2, B2, P3, B3 in order to derive bidirectionally predicted frame signals for the B frames. The reordered digital video signal is then provided to a second switch 104a, the first memory device 120 and the motion compensation device 150 via lines L18, L12, L1, respectively. In the second sequence coding mode, the line L17 is coupled to a line L10 by the first switch 103, and the second frame sequence I1, F1, P1, F2, P2, F3, P3 is coupled via the first switch 103 to the first memory device 120, the motion compensation device 150 and the second switch 104a on the lines L12, L1, L18, respectively. The first switch 103 is actuated by a sequence mode control signal CS1 from a conventional system controller, e.g., a microprocessor (not shown). As can be seen from the above, since there is a reordering delay in the first sequence coding mode, the second sequence coding mode may be advantageously used as a low-delay mode in such applications as videophone and teleconference devices.
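The reordering performed by the frame reordering circuit 101 can be sketched in Python as below. Frames are represented by their labels, with the first letter giving the frame type, and each B frame is assumed to wait for the next I or P frame, its future reference, before it is passed on; the function name is hypothetical.

```python
def reorder(frames):
    """Coding order: each B frame is emitted after the next anchor (I or P) frame,
    because its backward prediction needs that future frame first."""
    out, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)  # hold until its future reference has been emitted
        else:
            out.append(f)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b


print(reorder(["I1", "B1", "P1", "B2", "P2", "B3", "P3"]))
# ['I1', 'P1', 'B1', 'P2', 'B2', 'P3', 'B3']
print(reorder(["I1", "F1", "P1", "F2", "P2", "F3", "P3"]))
# unchanged: no frame waits for a future reference
```

The second call shows why the second sequence coding mode is the low-delay one: F frames refer only to earlier frames, so nothing has to be held back.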

As shown in Fig. 1, the image coding system includes the second switch 104a and a third switch 104b, which are used for selectively performing two frame coding modes: an intra frame coding mode and an inter frame coding mode. The second and the third switches 104a and 104b, as is well known in the art, are simultaneously actuated by a frame mode control signal CS2 from the system controller.

In the intra frame coding mode, the intra frame I1 is directly coupled as a current frame signal via a line L14 to an image signal encoder 105, wherein the current frame signal is encoded into a set of quantized transform coefficients, e.g., by using a discrete cosine transform (DCT) and any of the known quantization methods. The intra frame I1 is also stored as an original reference frame in a frame memory 121 of the first memory device 120, wherein the first memory device 120 includes three frame memories 121, 122 and 123, which are connected to the motion compensation device 150 through lines L2, L3 and L4, respectively. Thereafter, the quantized transform coefficients are transmitted to an entropy coder 107 and an image signal decoder 113. At the entropy coder 107, the quantized transform coefficients from the image signal encoder 105 are coded by using, e.g., a variable length coding technique, and transmitted to a transmitter (not shown) for the transmission thereof.

On the other hand, the image signal decoder 113 converts the quantized transform coefficients from the image signal encoder 105 back to a reconstructed intra frame signal by employing an inverse quantization and an inverse discrete cosine transform. The reconstructed intra frame signal from the image signal decoder 113 is then stored as a reconstructed reference frame in a frame memory 131 of the second memory device 130, wherein the second memory device 130 includes three frame memories 131, 132, 133, which are connected to the motion compensation device 150 via lines L'2, L'3, L'4, respectively.

In the inter coding mode, an inter frame (for example, the predictive frame P1, the bidirectionally predictive frame B1 or the forwardly predictive frame F1) is applied as a current frame signal to the subtractor 102 and the motion compensation device 150, and is stored in the frame memory 121 of the first memory device 120, wherein the so-called inter frames include the bidirectionally predictive frames B1, B2, B3, the predictive frames P1, P2, P3, and the forwardly predictive frames F1, F2, F3. The original reference frame previously stored in the frame memory 121 is then coupled via the line L2 to the motion compensation device 150, and shifted into the frame memory 122. The motion compensation device 150 includes a block based motion compensation channel and a feature point based motion compensation channel, as described hereinafter.

When the current frame is a predictive frame P1, the current frame signal on the line L1 and a reconstructed reference frame signal on a line L'1 from the frame memory 131 of the second memory device 130 are processed through the block based motion compensation channel to predict the current frame, with a view to generating the predicted current frame signal onto a line L30 and the set of motion vectors onto a line L20. When the current frame is the forwardly predictive frame F1 (or the bidirectionally predictive frame B1), the current frame signal on the line L1, the original reference frame signal on one of the lines L2, L3 and L4 from the first memory device 120 and the reconstructed reference frame signal on one of the lines L'2, L'3 and L'4 from the second memory device 130 are processed through the feature point based motion compensation channel to predict the current frame, generating a predicted current frame signal onto the line L30 and a set of motion vectors onto the line L20. The motion compensation device 150 will be described in detail with reference to Fig. 3.

The predicted current frame signal on the line L30 is subtracted from a current frame signal on the line L15 at the subtractor 102, and the resultant data, i.e., an error signal denoting the differential pixel values, is dispatched to the image signal encoder 105, wherein the error signal is encoded into a set of quantized transform coefficients, e.g., by using a DCT and any of the known quantization methods. That is, the errors obtained by subtracting the predicted current frame from the current frame are DCT-coded. In such a case, the quantizer step size is set to a large value in order to compensate only for the severely distorted regions caused by incorrectly estimated motion vectors.
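The effect of a large quantizer step can be illustrated with a few hypothetical error-signal coefficients, as in the Python sketch below. A plain uniform quantizer without a dead zone is assumed, and the coefficient values and step sizes are invented for illustration: small coefficients from well-predicted regions quantize to zero, while the large ones caused by a wrong motion vector survive.

```python
def quantize(coeffs, step):
    """Uniform quantization: level = round(c / step)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Reconstruction values for the quantization levels."""
    return [q * step for q in levels]


coeffs = [1, -3, 5, 60, -90]  # hypothetical DCT coefficients of the error signal
print(sum(1 for q in quantize(coeffs, 4) if q))   # 4 nonzero levels with a fine step
print(sum(1 for q in quantize(coeffs, 32) if q))  # 2 nonzero levels with a coarse step
print(dequantize(quantize(coeffs, 32), 32))       # [0, 0, 0, 64, -96]
```

Fewer nonzero levels means fewer bits after entropy coding, at the cost of leaving the small, visually less important errors uncorrected.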

Thereafter, the quantized transform coefficients are transmitted to the entropy coder 107 and the image signal decoder 113. At the entropy coder 107, the quantized transform coefficients from the image signal encoder 105 and the motion vectors transmitted through the line L20 from the motion compensation device 150 are coded together by using, e.g., a variable length coding technique, and transmitted to a transmitter (not shown) for the transmission thereof.

On the other hand, the image signal decoder 113 converts the quantized transform coefficients from the image signal encoder 105 back to a reconstructed error signal by employing inverse quantization and inverse discrete cosine transform.
The reconstructed error signal from the image signal decoder 113 and the predicted current frame signal on the line L16 from the motion compensation device 150 are combined via the switch 104b at the adder 115 to thereby provide a reconstructed reference frame signal via the line L'1 to be stored as the previous frame in the second memory device 130. The second memory device 130 includes, e.g., the three frame memories 131, 132 and 133, which are connected in series as shown in Fig. 1. That is, the reconstructed frame signal from the adder 115 is first stored in, e.g., the frame memory 131, and then provided to the motion compensation device 150 via the line L'2 and also shifted into the second frame memory 132 on a frame-by-frame basis when the next reconstructed frame signal from the adder 115 is inputted to the first frame memory 131. This process is sequentially repeated as long as the image encoding operation is performed.
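The three series-connected frame memories behave like a fixed-length shift register, which the Python sketch below models with a `collections.deque` of maximum length three. Frames are represented by labels, and the class and method names are hypothetical; the example stores the first three frames of the reordered first sequence, which leaves the memories holding B1, P1 and I1.

```python
from collections import deque


class FrameMemoryChain:
    """Frame memories connected in series (e.g. 131 -> 132 -> 133): storing a new
    frame in the first memory shifts every stored frame one memory further."""

    def __init__(self, depth=3):
        self.slots = deque(maxlen=depth)  # the oldest frame drops out of the last memory

    def store(self, frame):
        self.slots.appendleft(frame)

    def read(self, index):
        """Contents of memory `index` (0 = first), or None if nothing is stored yet."""
        return self.slots[index] if index < len(self.slots) else None


mem = FrameMemoryChain()
for frame in ["I1", "P1", "B1"]:
    mem.store(frame)
print([mem.read(i) for i in range(3)])  # ['B1', 'P1', 'I1']
```

Storing P2 next would push I1 out of the last memory, so each memory always holds a frame a fixed number of coding steps in the past.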

Referring to Figs. 2A and 2B, there are provided exemplary diagrams showing the first and second frame sequences described above. As shown, when the current frame is the predictive frame P1, a set of motion vectors SMV1 is obtained on a block-by-block basis by using, as the reference frame, the reconstructed intra frame I1 retrieved from the second memory device 130. In a similar manner, the sets of motion vectors SMV2 and SMV3 for the current frames P2 and P3 are obtained using the reference frames P1 and P2, respectively.

When the current frame is the bidirectionally predictive frame B1, a set of forward motion vectors FMV1 is obtained from the feature points by using the reconstructed reference frame I1 retrieved from the second memory device 130 and the original reference frame I1 retrieved from the first memory device 120. In a similar manner, the set of backward motion vectors BMV1 for the current frame B1 is obtained by using the original reference frame P1 and the reconstructed reference frame P1. Thereafter, the image encoding system chooses between the set of forward motion vectors FMV1 and the set of backward motion vectors BMV1 and transmits the corresponding motion vectors.

When the current frame is the forwardly predictive frame F1, a set of forward motion vectors FMV2 is obtained from the feature points by using the original reference frame I1 retrieved from the first memory device 120 and the reconstructed reference frame I1 retrieved from the second memory device 130.

As may be seen from the above, for the motion estimation and compensation, the frames contained in the first and the second frame sequences are arranged in the first and the second memory devices 120 and 130 as shown in Tables I and II:

Table I. The first frame sequence

L1    I1    P1       B1       P2       B2       P3       B3
L2    X     I1 (O)   P1 (Δ)   B1       P2 (Δ)   B2       P3 (Δ)
L3    X     X        I1 (O)   P1 (O)   B1       P2 (O)   B2
L4    X     X        X        I1       P1 (O)   B1       P2 (O)

Table II. The second frame sequence

L1    I1    F1       P1       F2       P2       F3       P3
L2    X     I1 (O)   F1       P1 (O)   F2       P2 (O)   F3
L3    X     X        I1 (O)   F1       P1 (O)   F2       P2 (O)
L4    X     X        X        I1       F1       P1       F2

wherein X indicates that no frame is stored yet, (O) indicates a frame used for the forward motion estimation and (Δ) denotes a frame used for the backward motion estimation.

As may be seen from the above, the predictive frames P1, P2, P3 are reconstructed by using the DCT based predictive coding, so-called TMN4, employing the block based motion estimation; and the intervening frames, i.e., the bidirectionally predicted frames B1, B2, B3 or the forwardly predicted frames F1, F2, F3, are reconstructed by using an improved feature point based motion compensation-discrete cosine transform (MC-DCT) in accordance with the present invention.

Referring to Fig. 3, there are illustrated details of
the motion compensation device 150 shown in Fig. 1. As
shown in Fig. 3, the motion compensation device 150
includes input selectors 154, 155 and 156, a block based
motion compensation circuit 151, a first feature point
based motion compensation circuit 152, a second feature
point based motion compensation circuit 153, and output
selectors 157 and 158.

The block based motion compensation circuit 151,
employing a conventional block matching algorithm, serves
to detect a set of motion vectors for each of the
predictive frames P1, P2, P3 and to generate a predicted
current frame for the corresponding predictive frame.
Therefore, when the predictive frame P1, as described in
Tables I and II, is applied as a current frame to the
block based motion compensation circuit 151, the selector
154 serves to couple the reconstructed intra frame I1 on
the line L'2 as the reference frame to the block based
motion compensation circuit 151. At the block based
motion compensation circuit 151, a set of motion vectors
is estimated and a predicted current frame signal is
constructed therethrough. Thereafter, the set of motion
vectors and the predicted current frame signal are
respectively coupled via the output selectors 157 and 158
on lines L20 and L30.
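The conventional block matching algorithm used by the block based motion compensation circuit 151 is not detailed in this description. Purely as an illustration, the following sketch performs a full search that minimizes the sum of absolute differences (SAD). The block size n=8, the search range r=7 and the name block_match are assumptions of this sketch, not parameters of circuit 151.

```python
def block_match(cur, ref, bx, by, n=8, r=7):
    """Full-search block matching (illustrative sketch).

    cur, ref: frames given as lists of rows of gray levels.
    (bx, by): top-left corner of the n x n block in the current frame.
    Returns the displacement (dx, dy), |dx|, |dy| <= r, for which the
    reference block at (bx + dx, by + dy) has the smallest SAD.
    """
    h, w = len(ref), len(ref[0])
    best = None  # (sad, dx, dy)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue  # candidate block would leave the reference frame
            sad = sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                      for j in range(n) for i in range(n))
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    if best is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best[1], best[2]


# Usage: each current pixel (x, y) equals the reference pixel at (x + 2, y - 3).
ref = [[x + 40 * y for x in range(32)] for y in range(32)]
cur = [[(x + 2) + 40 * (y - 3) for x in range(32)] for y in range(32)]
print(block_match(cur, ref, 12, 12))  # (2, -3)
```

A single vector per block can only describe translation, which is why the description turns to an affine, feature point based scheme for the intervening frames.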

The first feature point based motion compensation
circuit 152, employing an affine transform as described
hereinafter, serves to detect a set of forwardly estimated
motion vectors for each of the bidirectionally predictive
frames B1, B2, B3 or the forwardly predictive frames F1,
F2, F3 and to generate a predicted current frame for the
corresponding bidirectionally or forwardly predictive
frame. Therefore, when the bidirectionally predictive
frame B1 on the line L1 is applied as the current frame to
the feature point based motion compensation circuit 152,
the selector 155, as shown in Table I, serves to couple
the original intra frame I1 on the line L2 as the original
reference frame to the feature point based motion
compensation circuit 152. The selector 156 serves to
couple the reconstructed intra frame I1 on the line L'2 as
the reconstructed reference frame to the feature point
based motion compensation circuit 152 to generate the
predicted frame. At the first feature point based motion
compensation circuit 152, a set of forwardly estimated
motion vectors is estimated by using the reconstructed and
the original reference frames, and a predicted current
frame signal is constructed by using the reconstructed
reference frame. Thereafter, the set of forwardly
estimated motion vectors and the predicted current frame
signal are respectively coupled via the output selectors
157 and 158 on lines L20 and L30, wherein the output
selectors 157 and 158 are controlled by control signals CS5
and CS6 from the system controller (not shown).

The second feature point based motion compensation
circuit 153, employing an affine transform as described
hereinafter, serves to detect the set of backwardly
estimated motion vectors for each of the bidirectionally
predictive frames B1, B2, B3 and to generate a predicted
current frame for the corresponding bidirectionally
predictive frame. Therefore, when the bidirectionally
predictive frame B1 is applied as the current frame to the
second feature point based motion compensation circuit
153, the original predictive frame P1 on the line L2 is
coupled as the original reference frame to the second
feature point based motion compensation circuit 153, and
the reconstructed predictive frame P1 on the line L'2 is
coupled as the reconstructed reference frame to the second
feature point based motion compensation circuit 153. At
the second feature point based motion compensation circuit
153, a set of backwardly estimated motion vectors is
obtained by using the reconstructed and the original
reference frames, and a predicted current frame signal is
constructed by using the reconstructed reference frame.
Thereafter, the set of backwardly estimated motion vectors
and the predicted current frame signal are respectively
coupled via the output selectors 157 and 158 on lines L20
and L30.

Referring to Fig. 4, there are illustrated details of
the feature point based motion compensation circuit shown
in Fig. 3. A reconstructed reference frame signal on the
line L'2 from the second frame memory 130 is inputted to
a feature point selection block 210, which generates a set
of feature points, and to a motion compensation block 240.
The set of feature points is then coupled to the motion
vector search block 230 and the motion compensation block
240. The motion vector search block 230 receives the
original reference frame and the current frame and serves
to generate a set of motion vectors for the set of feature
points. The set of motion vectors is coupled to the
motion compensation block 240, which serves to generate a
predicted current frame based on the set of motion vectors
and the set of feature points.

At the feature point selection block 210, the set of
feature points is selected from a multiplicity of pixels
contained in the reconstructed reference frame, each of
the feature points being defined in terms of the position
of a pixel. An exemplary current frame and an exemplary
reconstructed reference frame are shown in Figs. 5A and 5B,
respectively.
Referring to Figs. 6A to 6E, there are shown
explanatory diagrams depicting a feature point selecting
process in accordance with the present invention. As
shown in Fig. 6A, edges are detected in the reconstructed
reference frame P(x,y) shown in Fig. 5B by using a
known Sobel edge detector (see, e.g., A. K. Jain,
"Fundamentals of Digital Image Processing", 1989,
Prentice-Hall International). The output |∇P(x,y)| from
the Sobel operator is compared with a predetermined
threshold Te, which is preferably set to 6 in accordance
with the present invention. If the output value |∇P(x,y)|
from the Sobel operator is less than the threshold Te, the
output value is set to 0; otherwise, it is left unchanged.
Therefore, an edge image signal eg(x,y) shown in Fig. 6A is
defined as follows:

$$
e_g(x,y) = \begin{cases} 0, & \text{if } |\nabla P(x,y)| < T_e \\ |\nabla P(x,y)|, & \text{otherwise} \end{cases}
$$
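The thresholded edge image defined above can be sketched as follows. The two 3x3 kernels are the standard Sobel kernels; taking |∇P(x,y)| as the Euclidean magnitude of the two Sobel responses, leaving border pixels at 0, and the name sobel_edge_image are assumptions of this sketch.

```python
import math


def sobel_edge_image(p, te=6):
    """Return eg(x, y): the Sobel gradient magnitude of frame p (a list of
    rows), set to 0 wherever it is below the threshold te. Border pixels,
    which lack a full 3x3 neighbourhood, are left at 0 (assumption)."""
    h, w = len(p), len(p[0])
    eg = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses.
            gx = (p[y - 1][x + 1] + 2 * p[y][x + 1] + p[y + 1][x + 1]
                  - p[y - 1][x - 1] - 2 * p[y][x - 1] - p[y + 1][x - 1])
            gy = (p[y + 1][x - 1] + 2 * p[y + 1][x] + p[y + 1][x + 1]
                  - p[y - 1][x - 1] - 2 * p[y - 1][x] - p[y - 1][x + 1])
            mag = math.hypot(gx, gy)
            eg[y][x] = mag if mag >= te else 0.0
    return eg


# Usage: a vertical step edge of height 10 between columns 3 and 4.
frame = [[0] * 4 + [10] * 4 for _ in range(5)]
print(sobel_edge_image(frame)[2])  # [0.0, 0.0, 0.0, 40.0, 40.0, 0.0, 0.0, 0.0]
```

With Te = 6, a step of height 1 (gradient magnitude 4) would be suppressed entirely, which is the intended effect of the threshold on weak texture and noise.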

In a preferred embodiment of the present invention,
the feature points are determined by using a grid
technique employing a hexagonal grid having a plurality of
overlapping hexagons, as shown in Fig. 6B. As shown in
Fig. 6C, a hexagon 610 is defined by line segments
connecting seven grid points 611 to 617. The grid point
617 at the center of the hexagon 610 is surrounded by more
neighboring grid points, i.e., the six grid points 611 to
616, than it would be in a tetragonal grid, thereby
allowing the feature points to be organized more
effectively. The hexagon 610 includes six non-overlapping
triangles 621 to 626, and the grid points 611 to 617 are
the vertices of the triangles 621 to 626. The resolution of
the hexagon 610 is defined by lines HH and HV, which, in
accordance with the present invention, are preferably set
to 13 and 10, respectively.
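To make the grid geometry concrete, the sketch below places grid points in rows HV pixels apart, with points HH pixels apart within a row and every other row shifted by half a spacing, and lists the six grid points surrounding a given one (cf. grid points 611 to 616 around 617). Reading HH and HV as horizontal and vertical spacings in pixels, the row-offset layout, and the function names are all assumptions.

```python
HH, HV = 13, 10  # preferred values from the text, read here as pixel spacings (assumption)


def grid_point(row, col):
    """Pixel position (x, y) of grid point (row, col); odd rows are shifted
    right by half a horizontal spacing (assumed layout)."""
    return (col * HH + (HH // 2 if row % 2 else 0), row * HV)


def hexagon_vertices(row, col):
    """(row, col) indices of the six grid points surrounding (row, col),
    i.e. the outer vertices of the hexagon centred on it."""
    s = 0 if row % 2 else -1  # diagonal neighbours depend on row parity
    return [(row, col - 1), (row, col + 1),
            (row - 1, col + s), (row - 1, col + s + 1),
            (row + 1, col + s), (row + 1, col + s + 1)]


# Usage: the hexagon centred on grid point (2, 3).
print(grid_point(2, 3))  # (39, 20)
print([grid_point(r, c) for r, c in hexagon_vertices(2, 3)])
```

Each hexagon is then split into six triangles by joining its center to consecutive outer vertices, matching triangles 621 to 626.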

Referring to Fig. 6D, for each of the grid points,
e.g., G1 to G4, a non-overlapping searching range, e.g.,
SR1 to SR4, is set. An edge point, e.g., E7, located in the
searching range SR1 becomes the feature point for the grid
point, e.g., G1, if the summation value of the eight pixels
surrounding the edge point E7 is maximum. Therefore, the
feature point Di may be represented as follows:

$$
D_i = \arg\max_{\substack{(x,y) \in SR_i \\ EG(x,y) \neq 0}} \; \sum_{\substack{k,l \in \{-1,0,1\} \\ (k,l) \neq (0,0)}} EG(x+k,\, y+l) \qquad \text{(Eq. 2)}
$$

wherein EG(x,y) is a value of the edge point contained in
the search region SRi and i is a positive integer.

The set of feature points is determined by using
Eq. 2, wherein the set of feature points includes grid
points overlapping edge points, each such edge point being
located in the non-overlapping searching range SRi and
having the maximum summation value of its surrounding
pixel points, and grid points having no edge point
contained in their non-overlapping searching ranges.

If more than one edge point with the same maximum
summation value exists, the edge point nearest to the
grid point is selected as the feature point.
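A minimal sketch of this selection rule for a single grid point: among the edge points (nonzero values of the edge image) inside the grid point's searching range, the one whose eight surrounding edge values have the largest sum becomes the feature point, ties going to the point nearest the grid point; if the range holds no edge point, the grid point itself is kept. The rectangular range of half-size half_w by half_h and the function name are assumptions.

```python
def select_feature_point(eg, grid_pt, half_w, half_h):
    """Return the feature point (x, y) for grid point grid_pt = (gx, gy).

    eg is the thresholded edge image (list of rows). The searching range is
    the rectangle gx +/- half_w, gy +/- half_h (assumed shape), clipped so
    that the eight neighbours of every candidate lie inside the image.
    """
    gx, gy = grid_pt
    h, w = len(eg), len(eg[0])
    best, best_key = grid_pt, None  # no edge point found yet: keep grid point
    for y in range(max(1, gy - half_h), min(h - 1, gy + half_h + 1)):
        for x in range(max(1, gx - half_w), min(w - 1, gx + half_w + 1)):
            if eg[y][x] == 0:
                continue  # not an edge point
            # Sum of the eight pixels surrounding the edge point.
            s = sum(eg[y + l][x + k]
                    for l in (-1, 0, 1) for k in (-1, 0, 1) if (k, l) != (0, 0))
            d2 = (x - gx) ** 2 + (y - gy) ** 2
            key = (s, -d2)  # larger sum wins; on a tie the nearer point wins
            if best_key is None or key > best_key:
                best, best_key = (x, y), key
    return best


# Usage: two adjacent edge points tie on the sum; the nearer one is chosen.
eg = [[0] * 9 for _ in range(9)]
eg[2][6] = eg[2][7] = 10   # each sees the other among its eight neighbours
eg[6][2] = 10              # isolated edge point, neighbour sum 0
print(select_feature_point(eg, (4, 4), 3, 3))  # (6, 2)
```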

When the set of feature points is determined, the
hexagonal grid shown in Fig. 6B is deformed into the
hexagonal feature point grid shown in Fig. 6E. After the
hexagonal feature point grid is determined, the set of
feature points is coupled to the motion vector search
block 230 shown in Fig. 4, which serves to detect a set of
motion vectors thereof. In accordance with the present
invention, a convergence process employing an affine
transform is used for searching the set of motion vectors.
Referring to Figs. 7A and 7B, there are shown exemplary
diagrams illustrating the motion vector searching process
in accordance with the present invention. A set of
quasi-feature points is determined in the current frame by
using the set of feature points, wherein each of the feature
points in the reconstructed reference frame is mapped to
the corresponding quasi-feature point in the current
frame. For each of the quasi-feature points, e.g., D1 to
D30, the initial motion vector is set to (0,0).

When a quasi-feature point, e.g., D7, is then assigned
or established as a subject quasi-feature point to be
processed for estimating its motion vector, a subject
current polygon 700 is used in the convergence process.
The subject current polygon 700 is defined by line
segments connecting the subject quasi-feature point D7
and its neighboring quasi-feature points, e.g., D1 to D6,
which surround the subject quasi-feature point D7. The
current polygon 700 includes six non-overlapping triangles
701 to 706, wherein the subject quasi-feature point is
located at the common vertex of the triangles.

A predetermined number of candidate motion vectors are
then sequentially added to the initial motion vector of
the quasi-feature point D7, wherein the candidate motion
vectors are preferably selected in the range from 0 to ±7,
horizontally and vertically; the candidate motion vector
D7Y1 is not allowed since it would reverse the triangle
701. A candidate motion vector D7X1 is added to the
initial vector of the subject quasi-feature point D7,
without changing the initial motion vectors of its six
neighboring quasi-feature points D1 to D6, in order to
produce an updated initial motion vector D7D'7.
Therefore, the updated initial motion vector D7D'7
represents a displacement between the subject
quasi-feature point D7 and a candidate quasi-feature
point D'7.
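The text rules out candidate motion vectors, such as D7Y1, that would reverse a triangle, but it does not say how a reversal is detected. One common test, assumed here, compares the sign of each triangle's signed area before and after its vertices are displaced by their motion vectors. The function names are hypothetical.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (sign gives its orientation)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def shift(p, mv):
    """Point p displaced by motion vector mv."""
    return (p[0] + mv[0], p[1] + mv[1])


def candidate_allowed(subject, candidate_mv, triangles):
    """Return False if moving `subject` by `candidate_mv` would reverse (or
    collapse) any triangle sharing that vertex.

    triangles: list of ((p, p_mv), (q, q_mv)), the other two vertices of each
    triangle with their current, fixed motion vectors.
    """
    moved = shift(subject, candidate_mv)
    for (p, p_mv), (q, q_mv) in triangles:
        before = signed_area2(subject, p, q)
        after = signed_area2(moved, shift(p, p_mv), shift(q, q_mv))
        if before * after <= 0:  # orientation flipped or triangle degenerate
            return False
    return True


# Usage: one triangle with vertices (0, 0), (10, 0), (0, 10); neighbours fixed.
tri = [(((10, 0), (0, 0)), ((0, 10), (0, 0)))]
print(candidate_allowed((0, 0), (3, 3), tri))  # True
print(candidate_allowed((0, 0), (7, 7), tri))  # False
```

For the subject hexagon, `triangles` would hold the six triangles 701 to 706, each given by the two neighboring quasi-feature points that, with D7, form its vertices.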

A predicted position for each of the pixels contained
in the subject current polygon 700 is determined on the
original reference frame by using the updated initial
motion vector and the initial vectors of the neighboring
quasi-feature points. Thereafter, each of the pixel
positions contained in the subject current polygon 700 is
interpolated by a pixel value on the original reference
frame corresponding to the predicted position to form a
predicted subject current polygon. In accordance with a
preferred embodiment of the present invention, this
process is performed by a known affine transform at each
of the triangles, e.g., 701, which has three
quasi-feature points, e.g., D1, D2, D7, as its vertices.
The affine transform is defined as follows:

$$
\begin{pmatrix} x' \\ y' \end{pmatrix} =
\begin{pmatrix} a & b \\ d & e \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} +
\begin{pmatrix} c \\ f \end{pmatrix} \qquad \text{(Eq. 3)}
$$

wherein (x,y) represents the x and y coordinates of a
pixel within the predicted subject current polygon;
(x',y') denotes the coordinates of a predicted position on
the original reference frame; and a to f are affine
transform coefficients.

The six mapping parameters a, b, c, d, e, f are
uniquely determined by using the motion vectors of the
three quasi-feature points, e.g., D1, D2, D7. Once the
affine transform coefficients are known, each of the
remaining pixels in the triangle 701 can be mapped onto a
position in the original reference frame. Because the
obtained predicted position (x',y') on the original
reference frame is not a pair of integers in most cases, a
known bilinear interpolation technique is used to
calculate the interpolated gray level at the predicted
position (x',y'). The affine mapping process is applied
to the triangles 701 to 706 independently. The predicted
subject current polygon for the candidate motion vector
is then obtained.
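The per-triangle mapping can be sketched as follows: the six coefficients are solved from the three vertex correspondences (each vertex of the current triangle paired with that vertex displaced by its motion vector), and gray levels at the resulting non-integer positions are obtained by bilinear interpolation. The sketch assumes the coefficient layout x' = ax + by + c, y' = dx + ey + f, clamps positions to the frame border, and uses hypothetical function names.

```python
def affine_from_triangle(src, dst):
    """Solve x' = a*x + b*y + c and y' = d*x + e*y + f from three vertex
    pairs src[i] -> dst[i] by Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)
    if det == 0:
        raise ValueError("degenerate triangle")

    def solve(v1, v2, v3):
        # Coefficients (p, q, r) with p*xi + q*yi + r = vi for i = 1, 2, 3.
        p = (v1 * (y2 - y3) + v2 * (y3 - y1) + v3 * (y1 - y2)) / det
        q = (x1 * (v2 - v3) + x2 * (v3 - v1) + x3 * (v1 - v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) + x2 * (y3 * v1 - y1 * v3)
             + x3 * (y1 * v2 - y2 * v1)) / det
        return p, q, r

    a, b, c = solve(dst[0][0], dst[1][0], dst[2][0])
    d, e, f = solve(dst[0][1], dst[1][1], dst[2][1])
    return a, b, c, d, e, f


def bilinear(frame, x, y):
    """Gray level of `frame` (list of rows) at a non-integer position (x, y),
    clamped to the frame border."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
    bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# Usage: all three vertices moved by (1, 2) give a pure translation.
src = [(0, 0), (10, 0), (0, 10)]
mvs = [(1, 2), (1, 2), (1, 2)]
dst = [(x + u, y + v) for (x, y), (u, v) in zip(src, mvs)]
a, b, c, d, e, f = affine_from_triangle(src, dst)
x, y = 3, 4                                      # a pixel inside the triangle
print(a * x + b * y + c, d * x + e * y + f)      # 4.0 6.0
print(bilinear([[0, 10], [20, 30]], 0.5, 0.5))   # 15.0
```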
The predicted subject current hexagon is then
compared with the current hexagon 700, and it is checked
whether the peak signal to noise ratio (PSNR) between the
predicted subject current hexagon and the current hexagon
has increased. If this is the case, the initial motion
vector (0,0) of the subject quasi-feature point D7 is
updated with the updated initial motion vector D7D'7.

For the remaining candidate motion vectors, the process
is repeated. The above process is also performed at all
of the quasi-feature points contained in said current
frame in one iteration.
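The PSNR test can be sketched as below, computed over the pixels of the current hexagon and of its prediction. An 8-bit peak value of 255 is assumed, since the text does not state the sample precision, and the function name is hypothetical.

```python
import math


def psnr(current, predicted, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values,
    e.g. all pixels inside the current hexagon and its prediction."""
    if len(current) != len(predicted) or not current:
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((c - p) ** 2 for c, p in zip(current, predicted)) / len(current)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


# Usage: a candidate is accepted only if it raises the PSNR.
hexagon = [100, 120, 140, 160]
old_prediction = [90, 125, 150, 150]
new_prediction = [98, 121, 139, 163]
print(psnr(hexagon, new_prediction) > psnr(hexagon, old_prediction))  # True
```

Because the hexagon size is fixed for a given subject point, comparing PSNR values is equivalent to comparing mean squared errors; the logarithm only changes the scale.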

Referring to Fig. 7B, assume that the first iteration
has been completed, that the quasi-feature point D7 is set
as the subject quasi-feature point, and that the updated
initial motion vectors of the neighboring quasi-feature
points D1 to D6 are D1D'1, D2D'2, D3D'3, D4D'4, D5D'5 and
D6D'6. In a similar manner, the predetermined candidate
motion vectors are sequentially added to the initial
vector D7D'7 of the subject quasi-feature point D7. For
example, the candidate motion vector D'7X2 is added to the
initial vector D7D'7 of the subject quasi-feature point
without changing the initial motion vectors of its six
neighboring quasi-feature points, D1D'1, D2D'2, D3D'3,
D4D'4, D5D'5 and D6D'6. Therefore, the updated initial
motion vector becomes D7X2. The predetermined candidate
motion vectors, as described above, are preferably
selected in the range from 0 to ±7, horizontally and
vertically, and the candidate motion vector D7Y2 is not
allowed since it would reverse the triangle 701.
A predicted position for each of the pixels contained
in the subject current polygon 700 is determined on the
original reference frame by using the updated motion
vector D7X2 and the initial vectors of the neighboring
quasi-feature points D1D'1, D2D'2, D3D'3, D4D'4, D5D'5 and
D6D'6. Thereafter, each of the pixel positions contained
in the subject current polygon 700 is interpolated by a
pixel value on the original reference frame corresponding
to the predicted position to form a predicted subject
current polygon 700' (represented by a phantom line in
Fig. 7B).

The predicted subject current hexagon 700' is then
compared with the current hexagon, and it is checked
whether the PSNR between the predicted subject current
hexagon and the current hexagon has increased. If this is
the case, the initial motion vector D7D'7 of the subject
quasi-feature point is updated with the updated initial
motion vector D7X2.

For the remaining candidate motion vectors, the process
is repeated. The above process is also performed at all
of the quasi-feature points contained in the current frame
in a second iteration.

This process is repeated for all of the quasi-feature
points several times until convergence is reached.
Preferably, the number of iterations is set to five
because, in most cases, the motion vectors converge before
the fifth iteration.
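The convergence process as a whole can be outlined as follows. This is only an outline under stated assumptions: the PSNR of the predicted subject current polygon and the triangle-reversal check are supplied by the caller as the hypothetical callbacks polygon_psnr and allowed; candidates form a square grid of ±search around the current vector; and the loop stops early once a full pass changes no vector, since a further pass would then give the same result.

```python
def converge(points, polygon_psnr, allowed=lambda p, mv, mvs: True,
             search=7, iterations=5):
    """Outline of the convergence process.

    points: ids of the quasi-feature points, in processing order.
    polygon_psnr(p, mv, mvs): PSNR of the predicted subject current polygon
        of p when p uses motion vector mv and its neighbours keep their
        vectors in mvs (hypothetical callback).
    allowed(p, mv, mvs): False if mv would reverse a triangle (hypothetical).
    """
    mvs = {p: (0, 0) for p in points}          # all initial vectors are (0, 0)
    for _ in range(iterations):                 # predetermined number of passes
        changed = False
        for p in points:                        # next subject quasi-feature point
            base = mvs[p]
            best_mv, best = base, polygon_psnr(p, base, mvs)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    mv = (base[0] + dx, base[1] + dy)  # candidate added to base
                    if mv == base or not allowed(p, mv, mvs):
                        continue
                    score = polygon_psnr(p, mv, mvs)
                    if score > best:            # keep only a PSNR increase
                        best_mv, best = mv, score
            if best_mv != base:
                mvs[p] = best_mv                # later points see it at once
                changed = True
        if not changed:
            break
    return mvs


# Toy usage: a stand-in score that peaks at a known "true" vector per point.
true_mv = {"D1": (3, -2), "D7": (10, 0)}


def toy_score(p, mv, mvs):
    return -((mv[0] - true_mv[p][0]) ** 2 + (mv[1] - true_mv[p][1]) ** 2)


print(converge(["D1", "D7"], toy_score))  # {'D1': (3, -2), 'D7': (10, 0)}
```

In the toy run D7 reaches (10, 0) only on the second pass, because each pass searches ±7 around the vector left by the previous one; this is how repeated iterations let a vector grow beyond the candidate range.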

As can be seen from the above, in the convergence
process, a displacement of each of the feature points is
given to its motion vector, and the six triangles of
each hexagon are affine-transformed independently
using the displacements of their vertex feature points.
If the displacement provides a better PSNR, the motion
vector of the subject feature point is sequentially
updated. Therefore, the convergence process is very
efficient in the matching process, determining a
predicted image as close as possible to the original image
even when objects are zoomed, rotated or scaled.

In accordance with a preferred embodiment of the
present invention, for hardware implementation, this
process can be accomplished in three steps. The
quasi-feature points denoted as D1, D3 and D5 shown in
Fig. 7A, which form non-overlapping subject current
polygons, are first processed simultaneously by using
their respective six neighboring feature points (D2, D7,
D6, D10, D11, D17), (D2, D4, D7, D12, D13, D19) and (D4,
D6, D7, D8, D9, D15). The same process is repeated next
for the points D2, D4 and D6. As the last step, the
remaining points D7, D8 and D9 are processed.
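The three-step schedule can be generalized to the whole grid. If the grid points are split into three classes such that no two points of the same class are neighbours, then updating one point never changes the polygon of another point in its class, so each class can be processed in parallel. A minimal sketch, assuming a row-offset hexagonal layout in which odd rows are shifted by half a spacing (the layout and the function names are assumptions, not taken from Fig. 7A), uses the standard three-colouring of a triangular lattice in axial coordinates.

```python
def processing_step(row, col):
    """Step (0, 1 or 2) in which grid point (row, col) is processed.

    Assumed layout: odd rows shifted right by half a spacing, so the six
    neighbours of (row, col) are (row, col +/- 1) plus two points in each
    adjacent row. Converting to axial coordinates (q, row) gives the
    three-colouring (q - row) mod 3, in which no two neighbours share a step.
    """
    q = col - (row - (row & 1)) // 2
    return (q - row) % 3


def schedule(rows, cols):
    """Group the grid points of a rows x cols grid into the three steps."""
    steps = [[], [], []]
    for r in range(rows):
        for c in range(cols):
            steps[processing_step(r, c)].append((r, c))
    return steps


# Usage: every grid point of a 4 x 6 grid falls into exactly one step.
print([len(s) for s in schedule(4, 6)])  # [8, 8, 8]
```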

Referring back to Fig. 4, the obtained motion vectors
for all of the quasi-feature points are then coupled as
the set of motion vectors for all of the feature points to
the motion compensation block 240, which serves to generate
a predicted current frame signal through the use of the
reconstructed reference frame. That is, the predicted
current frame signal is obtained by the affine transform
employing the reconstructed previous frame and the
obtained motion vectors. As can be seen from the above,
this is the same affine mapping as that used in the motion
vector search process, except that the reconstructed
reference frame is used instead of the original reference
frame, because a decoder system (not shown) has only the
reconstructed reference frame. On the other hand, since
the encoding system employing this feature point based
motion compensation produces a considerably good image
with the motion vectors only, the difference or error
signal between the current frame and the predicted
current frame need not be transmitted.

As may be seen from the above, it is readily
appreciated that the inventive encoder system employing
the feature point based motion compensation can obtain a
reliable set of motion vectors, thereby improving the
coding efficiency.

The feature point based motion compensation algorithm
is based on image features, and the affine transformation
is employed to compensate for rotation and zooming of an
object. In most cases, the motion compensated images have
a higher PSNR with good subjective quality. If the motion
prediction fails in cases of large-scale motion, the
error image can be coded and transmitted using the DCT with
a large quantization step. Specifically, good subjective
quality is obtained by using the inventive encoding system
at 24 kbps. Further, since the positions of the feature
points change from frame to frame, the inventive encoder
system employs, as the reference frame, a reconstructed
previous frame which exists both in the encoder and in the
decoder, so that it is not necessary to transmit the
position information of the feature points. Furthermore,
the pixelwise motion compensation employed in the present
encoding system produces a better subjective quality than
block based motion compensation, because it can
compensate for the zooming, rotation and scaling of
objects by using the affine transform with the motion
vectors only.

While the present invention has been shown and
described with reference to the particular embodiments, it
will be apparent to those skilled in the art that many
changes and modifications may be made without departing
from the spirit and scope of the invention as defined in
the appended claims.
What is claimed is:

1. A method for detecting a set of motion vectors
between a current frame and a reference frame of video
signals by employing a feature point based motion
estimation approach, wherein the reference frame includes
a reconstructed reference frame and an original reference
frame, which comprises the steps of:

(a) selecting a set of feature points from pixels
contained in the reconstructed reference frame, wherein the
set of feature points forms a polygonal grid having a
plurality of overlapping polygons;

(b) determining a set of quasi-feature points on the
current frame based on the set of feature points;

(c) assigning a set of initial motion vectors for the
set of quasi-feature points, wherein each of the initial
motion vectors is set to (0,0);

(d) appointing one of the quasi-feature points as a
subject quasi-feature point, wherein the subject
quasi-feature point has N number of neighboring
quasi-feature points which form a subject current polygon
defined by line segments connecting the subject
quasi-feature point and said N number of neighboring
quasi-feature points, N being a positive integer;

(e) sequentially adding the initial motion vector of
the subject quasi-feature point to M number of candidate
motion vectors to produce M number of updated initial
motion vectors, M being a positive integer, wherein said
M number of candidate motion vectors cover a predetermined
region in the subject current polygon and the initial
motion vectors of said neighboring feature points are
fixed;

(f) determining a predicted position on the original
reference frame for each pixel contained in the subject
current polygon based on each of the M number of updated
initial motion vectors for the subject quasi-feature point
and said N number of the initial motion vectors of the
neighboring quasi-feature points;

(g) providing a predicted pixel value for said each
pixel based on the predicted position from the original
reference frame to form M number of predicted subject
current polygons;

(h) calculating the difference between the current
polygon and each of the predicted subject current polygons
to produce M number of peak signal to noise ratios (PSNRs);

(i) selecting one of the updated motion vectors as a
selected updated motion vector, which entails a predicted
subject current polygon having a maximum PSNR, to update
the initial motion vector of the subject quasi-feature
point with the selected updated motion vector;

(j) repeating the steps (d) to (i) until all of the
initial motion vectors are updated;

(k) repeating the step (j) until said repeating is
carried out for a predetermined number of times; and

(l) establishing the set of initial vectors as the
set of motion vectors, to thereby determine the set of
motion vectors.

2. The method as recited in claim 1, wherein the step
(a) includes the steps of:

(a1) detecting an edge image of the reconstructed
reference frame, wherein the edge image eg(x,y) is
defined as follows:

$$
e_g(x,y) = \begin{cases} 0, & \text{if } |\nabla P(x,y)| < T_e \\ |\nabla P(x,y)|, & \text{otherwise} \end{cases}
$$

where P(x,y) represents the reference frame; |∇P(x,y)|
denotes an output from a known Sobel operator; and Te is
a predetermined threshold;
(a2) establishing a polygonal grid on the edge image,
wherein the polygonal grid includes a number of grid
points to form the plurality of overlapping polygons;

(a3) assigning a non-overlapping search range for
each of the grid points; and

(a4) determining the set of feature points, wherein
the set of feature points includes a grid point
overlapping an edge point, said edge point being located
in the non-overlapping search range and having a maximum
summation value of its surrounding pixel points, and said
grid point having no edge point contained in its
non-overlapping search range.

3. The method as recited in claim 2, wherein the set of
feature points includes an edge point nearest to the
polygonal grid when more than one edge point having an
equal maximum summation value appears in the search range.

4. The method as recited in claim 3, wherein the polygon
is a hexagon and N is 6.

5. The method as recited in claim 4, wherein the subject
current hexagon includes six triangles defined by line
segments connecting the subject quasi-feature point and
its neighboring quasi-feature points; and the steps
(f) and (g) are performed by using a known affine
transform.

6. The method as recited in claim 5, wherein the number
of surrounding pixel points is 8; the predetermined
repeating number is 5; and the predetermined threshold is
6.

7. The method as recited in claim 6, wherein the
predetermined region is in a range from 0 to ±7,
horizontally and vertically.
8. The method as recited in claim 7, wherein the feature
point Di is defined as follows:

$$
D_i = \arg\max_{\substack{(x,y) \in SR_i \\ EG(x,y) \neq 0}} \; \sum_{\substack{k,l \in \{-1,0,1\} \\ (k,l) \neq (0,0)}} EG(x+k,\, y+l)
$$

wherein EG(x,y) is a value of the edge point contained in
the search region and i is a positive integer.

9. An apparatus, for use in a video encoding system, for
detecting a set of motion vectors between a current frame
and a reference frame of video signals by employing a
feature point based motion estimation, wherein the
reference frame includes a reconstructed reference frame
and an original reference frame, which comprises:

first selection means for selecting a set of pixels
from the reconstructed reference frame as a set of
feature points, wherein the set of feature points forms a
polygonal grid having a plurality of overlapping polygons;

means for determining a set of quasi-feature points
on the current frame corresponding to the set of feature
points;

memory means for storing a set of initial motion
vectors for the set of quasi-feature points, wherein each
of the initial motion vectors is set to (0,0);

second selection means for selecting L number of
subject quasi-feature points from the set of quasi-feature
points, wherein each of the subject quasi-feature points
has N number of neighboring quasi-feature points which
form a non-overlapping subject current polygon defined by
line segments connecting the subject quasi-feature point
and said N number of neighboring quasi-feature points,
said L and N being positive integers;

adder means for adding the initial motion vector of
each of the subject quasi-feature points to M number of
candidate motion vectors to generate M number of updated
initial motion vectors for each of the subject
quasi-feature points, M being a positive integer, wherein
said M number of candidate motion vectors cover a
predetermined region in each of the non-overlapping
subject current polygons and the initial motion vectors of
the neighboring feature points for each of the subject
quasi-feature points are fixed;

means for determining a predicted position on the
original reference frame for each pixel contained in each
of the non-overlapping subject current polygons based on
each of the updated initial motion vectors and the initial
motion vectors of the corresponding neighboring
quasi-feature points;

means for obtaining a predicted pixel value from the
original reference frame based on the predicted position
to thereby form M number of predicted subject current
polygons for each of the non-overlapping subject current
polygons;

means for calculating the differences between each of
the non-overlapping subject current polygons and the
corresponding M number of predicted subject current
polygons to produce M number of peak signal to noise
ratios (PSNRs) for each of the non-overlapping subject
current polygons;

third selection means for selecting one of the
updated initial motion vectors, for each of the subject
quasi-feature points, as a selected updated initial motion
vector which entails the predicted subject current polygon
having a maximum PSNR to produce L number of selected
updated initial motion vectors;

means for updating the initial motion vector for each
of the subject quasi-feature points stored in the memory
means with the corresponding selected updated initial
motion vector; and

means for retrieving the set of initial motion
vectors from the memory means as the set of motion vectors
when all of the initial motion vectors have been updated a
predetermined number of times.

10. The apparatus as recited in claim 9, wherein the
first selection means includes:

means for detecting an edge image of the
reconstructed reference frame, wherein the edge image
eg(x,y) is defined as follows:

$$
e_g(x,y) = \begin{cases} 0, & \text{if } |\nabla P(x,y)| < T_e \\ |\nabla P(x,y)|, & \text{otherwise} \end{cases}
$$

where P(x,y) represents the reference frame; |∇P(x,y)|
denotes an output from a known Sobel operator; and Te is
a predetermined threshold;

means for providing a polygonal grid on the edge
image, wherein the polygonal grid includes a number of grid
points to form the plurality of overlapping polygons;

means for establishing a non-overlapping search range
for each of the grid points; and

means for determining the set of feature points,
wherein the set of feature points includes a grid point
overlapping an edge point, said edge point being located
in the search range and having a maximum summation value of
its surrounding pixel points, and said grid point having no
edge point contained in its non-overlapping search range.
11. The apparatus as recited in claim 10, wherein the set of
feature points includes an edge point nearest to the
polygonal grid when more than one edge point having an
equal maximum summation value appears in the search range.

12. The apparatus as recited in claim 11, wherein the
polygon is a hexagon and N is 6.

13. The apparatus as recited in claim 12, wherein the
subject current hexagon includes six triangles defined by
line segments connecting the subject quasi-feature point
and its neighboring quasi-feature points; and the means for
determining the predicted position includes a known affine
transformer.

14. The apparatus as recited in claim 13, wherein the
number of surrounding pixel points is 8; the predetermined
repeating number is 5; and the predetermined threshold is
6.

15. The apparatus as recited in claim 14, wherein the
predetermined region is in a range from 0 to ±7,
horizontally and vertically.

16. An apparatus for encoding a digital video signal to 
reduce the transmission rate of the digital video signal, 
said digital video signal having a plurality of frames 

25 including a current frame and a reference frame, which 
comprises : 

first memory means for storing a reconstructed 
reference frame of the digital video signal; 

second memory means for storing an original reference 
30 frame of the digital video signal; 

first motion compensation means for detecting a 
number of motion vectors between the current frame and the 
reconstructed reference frame by using a block based 
motion estimation and for generating a first predicted 
3 5 current frame based on the number of motion vectors and 
the reconstructed reference frame; 
second motion compensation means for selecting a set
of feature points from the reconstructed reference frame
to detect a set of motion vectors between the current
frame and the original reference frame corresponding to
the set of feature points by using a feature point based
motion estimation, and for generating a second predicted
frame based on the set of motion vectors and the
reconstructed reference frame;

means for selectively providing the number of motion
vectors and the first predicted current frame or the set
of motion vectors and the second predicted current frame
as selected motion vectors and the predicted current
frame;

means for transform-coding an error signal
representing the difference between the predicted current
frame and the current frame to produce a transform coded
error signal; and

means for statistically coding the transform coded
error signal and the selected motion vectors to produce an
encoded video signal to be transmitted.
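Claim 16 does not say how the selecting means chooses between the block based and the feature point based prediction. A minimal sketch, assuming the prediction whose error signal has the lower energy (sum of squared differences) is chosen, with ties going to the block based prediction; the transform coding and statistical coding stages are omitted, and all names are hypothetical.

```python
def error_signal(current, predicted):
    """Pixel-wise difference current - predicted (the signal to be transform coded)."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(current, predicted)]


def energy(err):
    """Sum of squared error values."""
    return sum(v * v for row in err for v in row)


def select_prediction(current, first, second):
    """Choose between (motion_vectors, predicted_frame) pairs.

    Assumed criterion: lower error energy wins; ties keep the first
    (block based) prediction.
    """
    errors = [error_signal(current, pred) for _, pred in (first, second)]
    if energy(errors[1]) < energy(errors[0]):
        return second[0], second[1], errors[1]
    return first[0], first[1], errors[0]


current = [[10, 10], [10, 10]]
block_based = ("block MVs", [[8, 8], [8, 8]])
feature_based = ("feature MVs", [[10, 9], [10, 10]])
mvs, predicted, err = select_prediction(current, block_based, feature_based)
print(mvs, err)  # feature MVs [[0, 1], [0, 0]]
```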

17. The apparatus as recited in claim 16, wherein the
second motion compensation means includes:

first selection means for selecting a set of pixels
from the reconstructed reference frame as a set of
feature points, wherein the set of feature points forms a
polygonal grid having a plurality of overlapping polygons;

means for determining a set of quasi-feature points
on the current frame corresponding to the set of feature
points;

memory means for storing a set of initial motion
vectors for the set of quasi-feature points, wherein each
of the initial motion vectors is set to (0,0);

second selection means for selecting L number of
subject quasi-feature points from the set of quasi-feature
points, wherein each of the subject quasi-feature points
has N number of neighboring quasi-feature points which
form a non-overlapping subject current polygon defined by
line segments connecting the subject quasi-feature point
and said N number of neighboring quasi-feature points,
said L and N being positive integers;

adder means for adding the initial motion vector of
each of the subject quasi-feature points to M number of
candidate motion vectors to generate M number of updated
initial motion vectors for each of the subject
quasi-feature points, M being a positive integer, wherein
said M number of candidate motion vectors cover a
predetermined region in each of the non-overlapping
subject current polygons and the initial motion vectors of
the neighboring feature points for each of the subject
quasi-feature points are fixed;

means for determining a predicted position on the
original reference frame for each pixel contained in each
of the non-overlapping subject current polygons based on
each of the updated initial motion vectors and the initial
motion vectors of the corresponding neighboring
quasi-feature points;

means for obtaining a predicted pixel value from the
original reference frame based on the predicted position
to thereby form M number of predicted subject current
polygons for each of the non-overlapping subject current
polygons;

means for calculating the differences between each of
the non-overlapping subject current polygons and the
corresponding M number of predicted subject current
polygons to produce M number of peak signal to noise
ratios (PSNRs) for each of the non-overlapping subject
current polygons;

third selection means for selecting one of the
updated initial motion vectors, for each of the subject
quasi-feature points, as a selected updated initial motion
vector which entails the predicted subject current polygon
having a maximum PSNR to produce L number of selected
updated initial motion vectors;

means for updating the initial motion vector for each
of the subject quasi-feature points stored in the memory
means with the corresponding selected updated initial
motion vector; and

means for retrieving the set of initial motion
vectors from the memory means as the set of motion vectors
when all of the initial motion vectors are updated by a
predetermined number of times.
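The convergence process of claim 17 can be sketched as the loop below. This is a minimal sketch under stated assumptions: PSNR is taken as 10·log10(255²/MSE) over the pixels of one polygon, and the hexagon prediction itself (affine warp plus pixel fetch from the original reference frame) is abstracted as a hypothetical callback `predict_polygon`, so the sketch shows the update order, not the warping. A subject point's vector is replaced only when a candidate yields a strictly higher PSNR, and the vectors of its neighbours stay fixed while it is being refined. The function and parameter names are hypothetical.

```python
import math


def psnr(actual, predicted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def converge(mvs, subject_sets, candidates, predict_polygon, iterations=5):
    """Iteratively refine motion vectors.

    mvs             -- dict: point id -> (dx, dy), initially all (0, 0)
    subject_sets    -- list of lists of point ids; the subject polygons in one
                       list do not overlap, so they are refined together
    candidates      -- (dx, dy) offsets added to the current vector
    predict_polygon -- hypothetical callback: (pid, vectors) ->
                       (actual_pixels, predicted_pixels) for pid's polygon
    iterations      -- the predetermined repeating number
    """
    for _ in range(iterations):
        for subjects in subject_sets:
            selected = {}
            for pid in subjects:
                base = mvs[pid]
                best_mv = base
                best = psnr(*predict_polygon(pid, mvs))
                for cx, cy in candidates:
                    trial = dict(mvs)  # neighbouring vectors stay fixed
                    trial[pid] = (base[0] + cx, base[1] + cy)
                    score = psnr(*predict_polygon(pid, trial))
                    if score > best:  # keep only strict PSNR improvements
                        best_mv, best = trial[pid], score
                selected[pid] = best_mv
            mvs.update(selected)  # write back all L selected vectors at once
    return mvs


# Toy usage: one subject point whose "true" displacement is (2, -1).
def toy_predict(pid, vectors):
    dx, dy = vectors[pid]
    err = abs(dx - 2) + abs(dy + 1)
    return [100.0, 100.0], [100.0 + err, 100.0 - err]


offsets = [(dx, dy) for dy in range(-7, 8) for dx in range(-7, 8)]
result = converge({0: (0, 0)}, [[0]], offsets, toy_predict)
print(result)  # {0: (2, -1)}
```

Because the subject polygons within one set do not overlap, each point's search is independent of the others in that set, which is why the selected vectors can be written back together after the set is processed.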

18. The apparatus as recited in claim 17, wherein the
first selection means includes:

means for detecting an edge image of the
reconstructed reference frame, wherein the edge image
$e_g(x,y)$ is defined as follows:

$$
e_g(x,y) = \begin{cases} 0, & \text{if } |\nabla p(x,y)| < T_e \\ |\nabla p(x,y)|, & \text{otherwise} \end{cases}
$$

where $p(x,y)$ represents the reference frame; $|\nabla p(x,y)|$
denotes an output from a known Sobel operator; and $T_e$ is
a predetermined threshold;

means for providing a polygonal grid on the edge
image wherein the polygonal grid includes a number of grid
points to form the plurality of overlapping polygons;

means for setting a non-overlapping search range for 
each of the grid points; and 

means for determining the set of feature points
wherein the set of feature points includes a grid point
overlapping an edge point, said edge point being located
in the search range and having a maximum summation value
of its surrounding pixel points, and said grid point
having no edge point contained in its non-overlapping
search range.
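The first selection means of claim 18, together with the tie-break of claim 11 and the 8 surrounding pixel points of claims 14 and 20, can be sketched as follows. Assumptions, none fixed by the claims: |∇p| is the Euclidean magnitude of the horizontal and vertical Sobel responses; border pixels of the edge image are left at 0; the non-overlapping search range is a square of half-width `r` around the grid point; frames are lists of rows indexed `[y][x]`; and the threshold of 6 in claims 14 and 20 is taken to be T_e. The function names are hypothetical.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def edge_image(p, t_e):
    """e_g(x, y): Sobel gradient magnitude of frame p, zeroed below T_e."""
    h, w = len(p), len(p[0])
    e = [[0.0] * w for _ in range(h)]  # border pixels stay 0 (assumption)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * p[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * p[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag = math.hypot(gx, gy)
            e[y][x] = 0.0 if mag < t_e else mag
    return e


def select_feature_point(e, gx, gy, r):
    """Feature point for grid point (gx, gy) within a square search range.

    Picks the edge point whose 8 surrounding edge-image values have the
    largest sum; ties go to the edge point nearest the grid point. If the
    range holds no edge point, the grid point itself is kept.
    """
    h, w = len(e), len(e[0])
    best = None  # ((summation, -squared_distance), (x, y))
    for y in range(max(gy - r, 0), min(gy + r, h - 1) + 1):
        for x in range(max(gx - r, 0), min(gx + r, w - 1) + 1):
            if e[y][x] == 0:
                continue  # not an edge point
            s = sum(e[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dx, dy) != (0, 0) and 0 <= y + dy < h and 0 <= x + dx < w)
            key = (s, -((x - gx) ** 2 + (y - gy) ** 2))
            if best is None or key > best[0]:
                best = (key, (x, y))
    return (gx, gy) if best is None else best[1]


step = [[0, 0, 10, 10] for _ in range(4)]  # vertical step edge
edges = edge_image(step, 6)
print(edges[1])  # [0.0, 40.0, 40.0, 0.0]
```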

19. The apparatus as recited in claim 18, wherein the
polygon is a hexagon, N is 6, the subject current hexagon
includes six triangles defined by line segments connecting
the subject quasi-feature point and its neighboring
quasi-feature points; and means for determining the
predicted position includes a known affine transformer.
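Claim 19 splits each subject hexagon into six triangles that share the subject quasi-feature point as a common vertex. A minimal sketch of locating the triangle that contains a pixel and computing its predicted position, assuming the six neighbours are listed in angular order around the centre and that the predicted position is the pixel's own position plus an interpolated displacement. Blending the three vertex motion vectors with barycentric weights reproduces the affine map fixed by that triangle. The function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    den = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    l1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / den
    l2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / den
    return l1, l2, 1.0 - l1 - l2


def predict_position(p, center, ring, center_mv, ring_mvs):
    """Predicted reference-frame position of pixel p inside a hexagon.

    center    -- subject quasi-feature point
    ring      -- its six neighbouring quasi-feature points, in angular order
    center_mv -- motion vector of the subject point
    ring_mvs  -- motion vectors of the six neighbours (same order as ring)
    """
    for k in range(6):
        k2 = (k + 1) % 6
        w = barycentric(p, center, ring[k], ring[k2])
        if min(w) >= -1e-9:  # p lies in triangle (center, ring[k], ring[k2])
            vecs = (center_mv, ring_mvs[k], ring_mvs[k2])
            dx = sum(wi * v[0] for wi, v in zip(w, vecs))
            dy = sum(wi * v[1] for wi, v in zip(w, vecs))
            return p[0] + dx, p[1] + dy
    raise ValueError("pixel lies outside the hexagon")


hexagon = [(2, 0), (1, 2), (-1, 2), (-2, 0), (-1, -2), (1, -2)]
still = [(0, 0)] * 6
print(predict_position((0, 0), (0, 0), hexagon, (2, 0), still))  # (2.0, 0.0)
```

Moving only the subject point, as the convergence process does for each candidate, deforms all six triangles at once while the outer ring of the hexagon stays put.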

20. The apparatus as recited in claim 19, wherein the
number of surrounding pixel points is 8, the predetermined
repeating number is 5; and the predetermined threshold is
6.

21. The apparatus as recited in claim 20, wherein the
predetermined region is in a range from 0 to ±7,
horizontally and vertically.